Day 22. Training: Trying Out Pretraining While Building an LLM

Now that we can compute the loss, we finally reach the most important part: training. We feed the dataset to the model, iterate over every batch in a loop, compute the loss for each batch, and update the model weights:

def train_model_simple(model, train_loader, val_loader, optimizer, device, num_epochs,
                       eval_freq, eval_iter, start_context, tokenizer):
    train_losses, val_losses, track_tokens_seen = [], [], []
    tokens_seen, global_step = 0, -1

    for epoch in range(num_epochs):
        model.train()  # Set model to training mode

        for input_batch, target_batch in train_loader:
            optimizer.zero_grad()  # Reset loss gradients from the previous batch
            loss = calc_loss_batch(input_batch, target_batch, model, device)
            loss.backward()   # Compute loss gradients
            optimizer.step()  # Update model weights using the gradients
            tokens_seen += input_batch.numel()
            global_step += 1

            # Periodically evaluate on the training and validation sets
            if global_step % eval_freq == 0:
                train_loss, val_loss = evaluate_model(
                    model, train_loader, val_loader, device, eval_iter)
                train_losses.append(train_loss)
                val_losses.append(val_loss)
                track_tokens_seen.append(tokens_seen)
                print(f"Ep {epoch+1} (Step {global_step:06d}): "
                      f"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}")

        # Print a generated sample after each epoch to inspect progress
        generate_and_print_sample(
            model, tokenizer, device, start_context
        )

    return train_losses, val_losses, track_tokens_seen

As the loop above shows, after backward() computes the loss gradients, step() applies them to update the model weights before moving on to the next batch.
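To make the role of step() more concrete, here is a minimal sketch (my own illustration, not part of the original code) of what a plain SGD update would do by hand; AdamW performs the same kind of update, but with adaptive learning rates and decoupled weight decay. The function name manual_sgd_step is hypothetical.

import torch

def manual_sgd_step(model, lr=0.0004):
    # loss.backward() has already filled p.grad for every parameter p;
    # the optimizer's job is to nudge each weight against its gradient.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad   # w <- w - lr * dL/dw
                p.grad.zero_()     # same role as optimizer.zero_grad()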

def evaluate_model(model, train_loader, val_loader, device, eval_iter):
    model.eval()  # Disable dropout for a stable loss estimate
    with torch.no_grad():  # No gradients needed when only measuring the loss
        train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter)
        val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter)
    model.train()  # Switch back to training mode
    return train_loss, val_loss

def generate_and_print_sample(model, tokenizer, device, start_context):
    model.eval()
    context_size = model.pos_emb.weight.shape[0]  # Max context length, taken from the positional embedding
    encoded = text_to_token_ids(start_context, tokenizer).to(device)
    with torch.no_grad():  # Inference only, no gradients needed
        token_ids = generate_text_simple(
            model=model, idx=encoded,
            max_new_tokens=50, context_size=context_size
        )
    decoded_text = token_ids_to_text(token_ids, tokenizer)
    print(decoded_text.replace("\n", " "))  # Compact print format
    model.train()

At the same time, we need evaluate_model() to check whether training is actually improving the model: it computes the loss on the training and validation sets after the weights have been updated. generate_and_print_sample(), in turn, lets us track the improvement qualitatively by printing a generated sample during training.
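Both helpers depend on text_to_token_ids(), token_ids_to_text(), and generate_text_simple() from the earlier posts in this series. In case they are not already in scope, rough sketches of the two conversion helpers look like this (assuming the tiktoken GPT-2 tokenizer used earlier; the exact details may differ from the earlier posts):

import torch

def text_to_token_ids(text, tokenizer):
    # Encode the text and add a batch dimension: shape (1, num_tokens)
    encoded = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
    return torch.tensor(encoded).unsqueeze(0)

def token_ids_to_text(token_ids, tokenizer):
    # Drop the batch dimension and decode back into a string
    return tokenizer.decode(token_ids.squeeze(0).tolist())

With those pieces in place, we can instantiate the model and kick off training: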

model = GPTModel(GPT_CONFIG_124M)
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0004, weight_decay=0.1)

num_epochs = 10
train_losses, val_losses, tokens_seen = train_model_simple(
    model, train_loader, val_loader, optimizer, device,
    num_epochs=num_epochs, eval_freq=5, eval_iter=5,
    start_context="Every effort moves you", tokenizer=tokenizer
)
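Note that device, train_loader, val_loader, GPT_CONFIG_124M, and tokenizer are all assumed to be in scope from the previous days. For a fresh session, a typical setup might look roughly like this (the seed and device logic here are my assumptions, not shown in the original post):

import torch
import tiktoken

torch.manual_seed(123)  # Make the run reproducible
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = tiktoken.get_encoding("gpt2")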

After actually running it, generate_and_print_sample() presents the training progress in a fairly readable way. Better still, we can visualize how the loss decreases over time:

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator


def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
    fig, ax1 = plt.subplots(figsize=(5, 3))

    ax1.plot(epochs_seen, train_losses, label="Training loss")
    ax1.plot(epochs_seen, val_losses, linestyle="-.", label="Validation loss")
    ax1.set_xlabel("Epochs")
    ax1.set_ylabel("Loss")
    ax1.legend(loc="upper right")
    ax1.xaxis.set_major_locator(MaxNLocator(integer=True))  # Integer ticks only on the epoch axis

    ax2 = ax1.twiny()  # Second x-axis sharing the same y-axis
    ax2.plot(tokens_seen, train_losses, alpha=0)  # Invisible plot, only to align the tick range
    ax2.set_xlabel("Tokens seen")

    fig.tight_layout()
    plt.savefig("loss-plot.pdf")
    plt.show()

epochs_tensor = torch.linspace(0, num_epochs, len(train_losses))
plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses)

On such a tiny dataset, the plot shows that the model is now overfitting the training set: the training loss keeps decreasing while the validation loss stops improving.
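Beyond eyeballing the plot, a quick hypothetical check (not in the original post) is to compare the last recorded training and validation losses; a large gap between them is the usual sign of overfitting:

print(f"Final training loss:   {train_losses[-1]:.3f}")
print(f"Final validation loss: {val_losses[-1]:.3f}")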

